CLUSTERING NEWS ARTICLES USING K-MEANS AND N-GRAMS

  • Project Research
  • 1-5 Chapters
  • Abstract : Available
  • Table of Content: Available
  • Reference Style: APA
  • Recommended for : Student Researchers
  • NGN 3000

ABSTRACT

Document clustering is an unsupervised machine learning technique that groups related items into clusters or subsets. The goal is to create clusters that are internally coherent but substantially different from one another: items within the same cluster should be highly similar to each other and highly dissimilar to items in other clusters. Automatic clustering of documents has played a significant role in many fields, including data mining and information retrieval. This thesis aims to improve the overall efficiency of a document clustering technique by using N-grams and an efficient similarity measure, and in doing so improves the purity and accuracy of the resulting clusters. The preprocessing method is based on N-grams (sequences of N consecutive characters), which give no special consideration to stop-words or punctuation but create an overlap among the contents of a document. This overlap makes the representation tolerant of errors, which greatly increases the quality of the clusters. The approach clusters news articles based on their N-gram representation, thereby reducing noise and increasing the probability of occurrence of the sequences within the articles. The proposed clustering technique has parameters that can be adjusted at the document representation level to improve the efficiency and quality of the generated clusters. Experiments carried out in the R programming environment on the real-world Reuters-21578 and 20 Newsgroups datasets demonstrated the effectiveness of the proposed technique at different N-gram levels in terms of the accuracy and purity of the generated clusters. The results also showed that the proposed technique performs better on average than the baseline technique in both accuracy and purity, with the best results obtained when the N-gram window is N = 3.
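The pipeline described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the R environment used in the thesis, and it is not the author's implementation: the cosine similarity measure, the spherical K-means variant, the function names, and the toy documents are all assumptions chosen for illustration. It shows how overlapping character N-grams (N = 3 by default) turn a document into a frequency vector, how K-means can group those vectors, and how purity is computed against known class labels.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character N-grams of a lowercased text."""
    text = " ".join(text.lower().split())  # normalise whitespace
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_vector(text, n=3):
    """Map a document to an L2-normalised N-gram frequency vector."""
    counts = Counter(char_ngrams(text, n))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {gram: c / norm for gram, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(gram, 0.0) for gram, w in u.items())


def kmeans(vectors, k, iterations=20, seed=0):
    """Spherical K-means (assumed variant): assign by cosine, update by mean."""
    rng = random.Random(seed)
    centroids = [dict(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the normalised sum of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the old centroid for an empty cluster
            total = {}
            for v in members:
                for gram, w in v.items():
                    total[gram] = total.get(gram, 0.0) + w
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[c] = {gram: w / norm for gram, w in total.items()}
    return labels


def purity(labels, classes):
    """Fraction of documents that belong to their cluster's majority class."""
    by_cluster = {}
    for lab, cls in zip(labels, classes):
        by_cluster.setdefault(lab, Counter())[cls] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


# Hypothetical toy corpus, not taken from Reuters-21578 or 20 Newsgroups.
docs = [
    "Stocks rally as markets close higher",
    "Stock markets fall on trade fears",
    "Team wins the football cup final",
    "Football team loses the cup match",
]
classes = ["finance", "finance", "sport", "sport"]

vectors = [ngram_vector(d, n=3) for d in docs]
labels = kmeans(vectors, k=2)
print("cluster labels:", labels)
print("purity:", purity(labels, classes))
```

Changing the `n` argument of `ngram_vector` corresponds to the adjustable document representation parameter mentioned in the abstract; the result on such a tiny corpus says nothing about the figures reported in the thesis.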



